The Effect of Annotation Scheme Decisions on Parsing Learner Data

Authors

  • Marwa Ragheb
  • Markus Dickinson
Abstract

We present a study on the dependency parsing of second language learner data, focusing less on parsing techniques and more on the effect of the linguistic distinctions made in the data. In particular, we examine syntactic annotation that relies more on morphological form than on meaning. We assess the effect of particular linguistic decisions by: 1) converting and transforming a training corpus with a similar annotation scheme, with transformations applied either before or after parsing; 2) supplying different kinds of part-of-speech (POS) information as input; and 3) analyzing the output. While we observe a general preference for parsing with more local dependency relations, this seems to be less the case for the data of lower-level learners.
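The abstract does not say which transformations were applied. As a rough sketch only, assuming a hypothetical head-swapping transformation (auxiliary vs. main verb as head, with an invented label `vc`) and a simple illustrative token representation, the snippet below shows why a transformation applied before parsing needs an inverse: the parser's output has to be mapped back to the target scheme before it is scored. `Token`, `aux_as_head`, `verb_as_head` and `attachment_scores` are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    form: str
    pos: str
    head: int    # 1-based index of the head token; 0 = artificial root
    deprel: str


Sentence = List[Token]


def aux_as_head(sent: Sentence) -> Sentence:
    """Hypothetical transformation: make each auxiliary the head of its verb.

    The auxiliary inherits the verb's head and relation; the verb is
    re-attached to the auxiliary with the made-up label 'vc'. Other
    dependents stay on the verb. Assumes at most one auxiliary per verb.
    """
    out = list(sent)
    for i, tok in enumerate(sent, start=1):
        if tok.deprel == "aux":
            verb = sent[tok.head - 1]
            out[i - 1] = replace(tok, head=verb.head, deprel=verb.deprel)
            out[tok.head - 1] = replace(verb, head=i, deprel="vc")
    return out


def verb_as_head(sent: Sentence) -> Sentence:
    """Inverse of aux_as_head: restore the verb as head of its auxiliary."""
    out = list(sent)
    for i, tok in enumerate(sent, start=1):
        if tok.deprel == "vc":
            aux = sent[tok.head - 1]
            out[i - 1] = replace(tok, head=aux.head, deprel=aux.deprel)
            out[tok.head - 1] = replace(aux, head=i, deprel="aux")
    return out


def attachment_scores(gold: Sentence, pred: Sentence) -> Tuple[float, float]:
    """Unlabeled and labeled attachment scores (UAS, LAS) for one sentence."""
    assert len(gold) == len(pred), "sentences must be token-aligned"
    uas = sum(g.head == p.head for g, p in zip(gold, pred)) / len(gold)
    las = sum(g.head == p.head and g.deprel == p.deprel
              for g, p in zip(gold, pred)) / len(gold)
    return uas, las


if __name__ == "__main__":
    # "she has left" annotated with the main verb as head
    gold = [
        Token("she", "PRP", 3, "nsubj"),
        Token("has", "VBZ", 3, "aux"),
        Token("left", "VBN", 0, "root"),
    ]
    # Before-parsing route: train on aux_as_head(...) trees, then map the
    # parser's output back with verb_as_head(...) before scoring.
    transformed = aux_as_head(gold)
    assert verb_as_head(transformed) == gold
    # Scoring without mapping back penalizes the scheme difference itself.
    print(attachment_scores(gold, transformed))  # (0.3333333333333333, 0.3333333333333333)
```

In the after-parsing route, the parser would instead be trained on the untransformed trees and the transformation applied to its output; in either case, attachment scores are only comparable when gold and predicted trees follow the same scheme.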


Similar resources

Phrase Structure Annotation and Parsing for Learner English

There has been almost no work on phrase structure annotation and parsing specially designed for learner English despite the fact that they are useful for representing the structural characteristics of learner English. To address this problem, in this paper, we first propose a phrase structure annotation scheme for learner English and annotate two different learner corpora using it. Second, we s...


Inter-annotator Agreement for Dependency Annotation of Learner Language

This paper reports on a study of interannotator agreement (IAA) for a dependency annotation scheme designed for learner English. Reliably-annotated learner corpora are a necessary step for the development of POS tagging and parsing of learner language. In our study, three annotators marked several layers of annotation over different levels of learner texts, and they were able to obtain generall...


REALEC learner treebank: annotation principles and evaluation of automatic parsing

The paper presents a Universal Dependencies (UD) annotation scheme for a learner English corpus. The REALEC dataset consists of essays written in English by Russian-speaking university students in the course of general English. The original corpus is manually annotated for learners’ errors and gives information on the error span, error type, and the possible correction of the mistake provided b...


The effect of disfluencies and learner errors on the parsing of spoken learner language

NLP tools are typically trained on written data from native speakers. However, research into language acquisition and tools for language teaching & proficiency assessment would benefit from accurate processing of spoken data from second language learners. In this paper we discuss manual annotation schemes for various features of spoken language; we also evaluate the automatic tagging of one par...


Linguistic Issues in Language Technology – LiLT

Parsing learner data poses a great challenge for standard tools, since non-canonical and unusual structures may lead to wrong interpretations on the part of the taggers and parsers. It is well known that providing a statistical parser with perfect part-of-speech (POS) tags is of great benefit for parsing accuracy, and that parsing results can decrease considerably when the parser has to predict...



Publication date: 2014